Effectiveness of Colorectal Cancer (CRC) Screening on All-Cause and CRC-Specific Mortality Reduction: A Systematic Review and Meta-Analysis

Simple Summary
Colorectal cancer (CRC) screening is one of the most effective measures to prevent CRC, resulting in a decrease in CRC mortality. Mortality reduction (MR) from CRC screening has been estimated in large-scale randomized control trials (RCTs) as well as in model studies, but there is a wide range of CRC-specific MR estimates and a lack of estimates of all-cause MR. We found that biennial FIT, gFOBT, single/5-yearly FS, and 10-yearly colonoscopy screenings reduced CRC-specific mortality significantly, and 10-yearly colonoscopy is the most effective with a mortality reduction of 73%. The effectiveness of screening increases at younger screening initiation ages and higher adherences. Our findings also suggest that adherence is an important factor in CRC-specific mortality and is an explanation for the discrepancy in the pooled estimates published thus far.

Abstract
(1) Background: The aim of this study was to pool and compare all-cause and colorectal cancer (CRC) specific mortality reduction of CRC screening in randomized control trials (RCTs) and simulation models, and to determine factors that influence screening effectiveness. (2) Methods: PubMed, Embase, Web of Science and Cochrane library were searched for eligible studies. Multi-use simulation models or RCTs that compared the mortality of CRC screening with no screening in the general population were included. CRC-specific and all-cause mortality rate ratios and 95% confidence intervals were calculated by a bivariate random model. (3) Results: Ten RCTs and 47 model studies were retrieved. The pooled CRC-specific mortality rate ratios in RCTs were 0.88 (0.80, 0.96) and 0.76 (0.68, 0.84) for guaiac-based fecal occult blood tests (gFOBT) and single flexible sigmoidoscopy (FS) screening, respectively. For the model studies, the rate ratios were 0.45 (0.39, 0.51) for biennial fecal immunochemical tests (FIT), 0.31 (0.28, 0.34) for biennial gFOBT, 0.61 (0.53, 0.72) for single FS, 0.27 (0.21, 0.35) for 10-yearly colonoscopy, and 0.35 (0.29, 0.42) for 5-yearly FS. The CRC-specific mortality reduction of gFOBT increased with higher adherence in both study types (RCT: 0.78 (0.68, 0.89) vs. 0.92 (0.87, 0.98), model: 0.30 (0.28, 0.33) vs. 0.92 (0.51, 1.63)). Model studies showed a 0.62–1.1% all-cause mortality reduction with single FS screening. (4) Conclusions: Based on RCTs and model studies, biennial FIT/gFOBT, single and 5-yearly FS, and 10-yearly colonoscopy screening significantly reduce CRC-specific mortality. The model estimates are much higher than in RCTs, because the simulated biennial gFOBT assumes higher adherence. The effectiveness of screening increases at younger screening initiation ages and higher adherences.


Introduction
Colorectal cancer (CRC) accounts for approximately 10% of all cancer incidence and mortality worldwide, with an estimated 1.93 million new cases diagnosed and 0.94 million deaths in 2020 [1,2]. The 5-year CRC survival in 2014 was over 60% in high-income countries, and less than 50% in South American and Asian countries [3,4]. The majority of CRC arises from precursor lesions, either in the classic pathway, with the most common lesions being adenomas, or in the serrated pathway, with polypus serrated lesions [4,5]. Usually, it takes 10-15 years for these precursor lesions to progress to CRC [6,7]. CRC screening is one of the most effective measures to prevent CRC, resulting in a decrease in CRC mortality [4,8]. As such, CRC screening is recommended by the World Health Organization (WHO) and has been implemented in several countries [4,9]. Biennial FIT for people under 75 years of age is the most common screening scenario in countries where population-based CRC screening has been implemented [4].
The mortality reduction (MR), life years gained (LYG) and quality-adjusted life years (QALY) gained from CRC screening have been evaluated in large-scale randomized control trials (RCTs) and in model studies. Five RCTs on CRC screening with guaiac-based fecal occult blood tests (gFOBT) in 765,685 participants and four RCTs with flexible sigmoidoscopy (FS) in 458,022 participants have been reported in the CRC handbook of the International Agency for Research on Cancer [4]. In addition, several models have been widely used to evaluate CRC screening scenarios efficiently and economically [10–14].
In summary, there is uncertainty over CRC-specific MRs and a lack of estimates of all-cause MR in the general population due to CRC screening, and model studies tend to give higher estimates than RCTs for disease-specific mortality. Therefore, in this systematic review and meta-analysis, we aim to synthesize and compare the effectiveness of different CRC screening interventions in the general population on all-cause and CRC-specific MR compared with no screening in RCTs and simulation models. In addition, we aim to evaluate the factors that influence screening effectiveness to determine how CRC screening could be improved.

Materials and Methods
We registered a predefined protocol of this study in the International Prospective Register of Systematic Reviews (PROSPERO registration number: CRD42021270887). This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [45].

Data Sources and Search Strategies
We conducted a systematic literature search in PubMed, Embase, Web of Science and Cochrane library for published RCT studies (1 January 2006 to 31 July 2022) and model studies (1 January 2016 to 31 July 2022) with the following keywords: ("randomized controlled trials") for RCTs and ("computer simulation" or "models" or "modelling" or "Markov chain") for simulation models, ("mortality" or "cost-benefit analysis" or "effectiveness" or "life-year"), ("early detection of cancer" or "mass screening" or "fecal immunochemical test" or "fecal occult blood test" or "colonoscopy" or "sigmoidoscopy"), and ("colorectal cancer" or "bowel cancer" or "colon cancer" or "rectum cancer"). Only the keywords for retrieving RCTs were used for the Cochrane library, since this database includes only trial studies. Detailed search strategies for all databases are shown in Supplementary Materials Table S1.
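As an illustration only, the keyword groups above can be assembled into a Boolean query string. The sketch below assumes that terms are joined with OR within a group and that groups are joined with AND; the exact database-specific syntax is given in Supplementary Materials Table S1 and is not reproduced here. The function names `or_group` and `build_query` are hypothetical.

```python
from typing import List, Sequence

# Keyword groups as listed in the text above.
RCT_TERMS = ["randomized controlled trials"]
MODEL_TERMS = ["computer simulation", "models", "modelling", "Markov chain"]
OUTCOME_TERMS = ["mortality", "cost-benefit analysis", "effectiveness", "life-year"]
SCREENING_TERMS = [
    "early detection of cancer", "mass screening", "fecal immunochemical test",
    "fecal occult blood test", "colonoscopy", "sigmoidoscopy",
]
DISEASE_TERMS = ["colorectal cancer", "bowel cancer", "colon cancer", "rectum cancer"]


def or_group(terms: Sequence[str]) -> str:
    """Quote each term and join the group with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(design_terms: Sequence[str]) -> str:
    """Join the four keyword groups with AND (an assumption, see the lead-in)."""
    groups: List[Sequence[str]] = [design_terms, OUTCOME_TERMS, SCREENING_TERMS, DISEASE_TERMS]
    return " AND ".join(or_group(g) for g in groups)


rct_query = build_query(RCT_TERMS)
model_query = build_query(MODEL_TERMS)
print(rct_query)
```

Keeping the study-design group as the only argument mirrors the description in the text, where the RCT and model searches differ only in that first group.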

Study Selection and Data Extraction
Two reviewers (SZ, JS) independently screened the potentially relevant studies based on eligibility criteria and extracted data from included studies. A study was eligible for inclusion if the following criteria were met: (1) multi-use simulation model or the latest publication of an RCT that compared commonly used CRC screening scenarios with no screening in the general population; (2) original study published in English with outcomes on survival, death number, and CRC-specific or all-cause MR by CRC screening. Studies published as conference abstract, editorial, review, research protocol, study design, or implementation report of a screening program without original data were excluded. For a detailed overview of the inclusion and exclusion criteria see Supplementary Materials Box S1.
Study information, country/population, screening scenario, adherence to screening, screening time, and follow-up time were extracted for both study types. For RCTs, we obtained the screening program, number of participants and person-years of observation in the screening and control groups, all-cause/CRC death number, and compliance-adjusted outcomes. For model studies, we obtained the model used and the all-cause/CRC mortality in the screening and no-screening scenarios.

Quality Assessment for RCTs and Simulation Models
For the RCT studies, we applied the revised Cochrane risk-of-bias tool for randomized trials (RoB2) [46]. There are five domains in this tool, including randomization process, deviations from intended interventions, missing outcome data, measurement of outcome, and selection of reported results. We classified the risk of bias on each domain and study as "low", "high", or "some concerns".
For the simulation models, we selected the study with the most complete description to evaluate model quality. A qualitative assessment framework covering modelling approach, model parameters, transparency of data sources/assumptions, and external validation was used to assess the overall risk of bias (Supplementary Materials Box S2) [47]. When two or more items were not clearly described, the model was assessed as high risk of bias; otherwise, as low risk. The two reviewers resolved disagreements on study selection, data extraction, and assessments by discussion and further review, or by arbitration by a third author (GdB).
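The decision rule for model quality can be written down compactly. The following is a minimal sketch, assuming the four framework items are scored as clearly described or not; the item keys are hypothetical labels, and the actual framework is defined in Supplementary Materials Box S2.

```python
from typing import Dict

# Hypothetical keys for the four framework items named in the text.
FRAMEWORK_ITEMS = (
    "modelling_approach",
    "model_parameters",
    "transparency_of_sources_and_assumptions",
    "external_validation",
)


def model_risk_of_bias(clearly_described: Dict[str, bool]) -> str:
    """Return 'high' when two or more items lack a clear description, else 'low'.

    Items missing from the input are counted as not clearly described.
    """
    unclear = sum(1 for item in FRAMEWORK_ITEMS if not clearly_described.get(item, False))
    return "high" if unclear >= 2 else "low"


example = {
    "modelling_approach": True,
    "model_parameters": True,
    "transparency_of_sources_and_assumptions": False,
    "external_validation": False,
}
print(model_risk_of_bias(example))  # two unclear items
```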

Data Synthesis and Analysis
Data were synthesized and analyzed separately according to screening interventions (gFOBT or FS) in RCT studies. The primary outcomes were all-cause and CRC-specific MR, as measured by calculating the mortality rate ratio between screening and no screening scenarios. We used the original data in a pooled intention-to-screen analysis, which included all individuals as randomized. Because follow-up times varied, observed person-years were extracted to calculate mortality density. RCT studies providing compliance-adjusted results were included in the compliance-adjusted analysis. Rate ratio, hazard ratio, and relative risk were assumed to be similar in the case of large sample sizes and were pooled directly. Regarding the model studies, we summarized data by screening scenarios and synthesized the commonly used global and Dutch scenarios.
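To make the outcome measure concrete, the sketch below computes a mortality rate ratio and a 95% confidence interval from deaths and person-years, using the usual large-sample standard error of the log rate ratio. The input numbers are invented for illustration and do not come from any included trial; the function name `mortality_rate_ratio` is hypothetical.

```python
import math
from typing import Tuple


def mortality_rate_ratio(
    deaths_screen: int,
    person_years_screen: float,
    deaths_control: int,
    person_years_control: float,
    z: float = 1.96,
) -> Tuple[float, float, float, float]:
    """Return (rate ratio, lower CI, upper CI, SE of the log rate ratio).

    The no-screening (control) arm is the reference.
    """
    rate_screen = deaths_screen / person_years_screen
    rate_control = deaths_control / person_years_control
    rr = rate_screen / rate_control
    # Large-sample standard error of log(rate ratio) for Poisson counts.
    se_log_rr = math.sqrt(1.0 / deaths_screen + 1.0 / deaths_control)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper, se_log_rr


# Illustrative numbers only: 80 vs. 100 CRC deaths over 100,000 person-years each.
rr, lower, upper, se_log_rr = mortality_rate_ratio(80, 100_000, 100, 100_000)
print(f"rate ratio {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

A rate ratio below 1 corresponds to a mortality reduction of (1 - rate ratio) in the screening arm relative to no screening.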
Due to differences in the population, trial procedures, model assumptions, and interventions of the included RCTs and simulation models, the expected effect values of these studies were not identical. To ensure that all effects were represented in the pooled effects, and not overly influenced by one study, a bivariate random model was applied to estimate the pooled all-cause or CRC-specific mortality rate ratio with 95% CI. The no screening group/scenario was the reference. We then qualitatively synthesized all-cause MRs in model studies.
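The idea of random-effects pooling can be sketched as follows. This is not the authors' implementation (the analysis was run in the R meta package and Review Manager); it assumes an inverse-variance DerSimonian-Laird estimator on the log rate ratio scale, a choice the text does not specify. The function name `pool_random_effects` and the inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_random_effects(
    log_rr: Sequence[float], se: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float, float]:
    """Return (pooled rate ratio, lower CI, upper CI, tau^2).

    Uses inverse-variance weights with the DerSimonian-Laird estimate of the
    between-study variance tau^2 (an assumption, see the lead-in).
    """
    w = [1.0 / s ** 2 for s in se]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum_w
    # Cochran's Q around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(log_rr) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
        tau2,
    )


# Illustrative inputs only: three hypothetical studies.
example_log_rr = [math.log(0.85), math.log(0.90), math.log(0.80)]
example_se = [0.05, 0.08, 0.10]
pooled_rr, ci_low, ci_high, tau2 = pool_random_effects(example_log_rr, example_se)
print(f"pooled rate ratio {pooled_rr:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

Because tau^2 is added to every study's variance, large studies are down-weighted relative to a fixed-effect analysis, which is what keeps the pooled effect from being dominated by one study.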
Meta regression was applied before subgroup analyses to explore possible sources of heterogeneity (I2 ≥ 50%) when there were more than 10 studies/scenarios. Our subgroup analyses followed if heterogeneity existed or when we observed differences in factors affecting screening effectiveness among the pooled studies. We classified subgroups by risk of bias, adherence, screening initiation age, population, model, or screening scenario and assessed publication bias by funnel plots or Egger's test. A two-sided p < 0.05 was considered statistically significant. Statistical analysis was performed using the meta package in R 3.6.1 and Review Manager (Version 5.4.1, The Cochrane Collaboration, 2020, London, UK).
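The heterogeneity criterion can be illustrated with a short sketch that computes Cochran's Q and I2 from study-level log rate ratios and then applies the rule stated above (I2 ≥ 50% and more than 10 studies/scenarios). The function names are hypothetical and the inputs are invented; the published analysis relied on the R meta package and Review Manager.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(log_rr: Sequence[float], se: Sequence[float]) -> Tuple[float, float]:
    """Return (Cochran's Q, I2 in percent) using inverse-variance weights."""
    w = [1.0 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(log_rr) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def meta_regression_indicated(i2_percent: float, n_studies: int) -> bool:
    """Apply the rule from the text: I2 >= 50% and more than 10 studies/scenarios."""
    return i2_percent >= 50.0 and n_studies > 10


# Illustrative inputs only: two hypothetical studies with very different effects.
q, i2 = cochran_q_and_i2([0.0, 4.0], [1.0, 1.0])
print(f"Q = {q:.1f}, I2 = {i2:.1f}%")
```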

Literature Search

Quality Assessment
Concerning deviations from intended interventions, five RCTs [48–52] showed some concerns on risk of bias because blinding was not used, and the other five studies [53–57] were evaluated as high risk because of the lack of blinding and of an appropriate analysis to estimate the effect of adhering to the intervention. The study of Thiis-Evensen et al. was deemed to have some concerns on the randomization process due to its small sample size [57]. Overall, five studies were considered as low risk, four studies as some concerns, and one study as high risk of bias (Supplementary Materials Figure S1).

Study Characteristics
Five RCTs used gFOBT and the other five RCTs used FS as CRC screening interventions (Table 1). There were four European trials and one US trial with gFOBT screening [49,51,52,54,56]. The total number of participants ranged from 46,551 in the US trial to 360,492 in the Finland study. Adherence rates varied from 57.0% to 90.0%. The RCTs with FS screening consisted of four European trials and one US trial [48,50,53,55,57]. Among these trials, the total number of participants ranged from 799 to 170,034. The adherence rates ranged from 57.8% to 86.6%.

Subgroup Analysis
Subgroup analysis by adherence showed that there was a significant difference between the ≥70% and <70% adherence groups, and CRC-specific MR by FOBT increased with higher adherence (Supplementary Materials Table S3).

Subgroup Analysis
There was heterogeneity among studies of 10-yearly colonoscopy from 55 to 75 years on CRC-specific MR (I2 = 55%, p < 0.001), and meta regression showed that adherence was the main source of heterogeneity (Supplementary Materials Figure S4 and Table S10).
The all-cause MR was presented in two studies. In the one from Norway, the results of the 3% CRC risk population were selected [20,76]. An all-cause MR of 1.1-1.4% was shown with perfect adherence to annual/biennial FIT and single FS/colonoscopy from 50 to 79 years [20]. In the Dutch study, single FS reduced all-cause mortality by 0.62% with 73% adherence [29] (Supplementary Materials, Table S14).

Discussion
By systematic selection of RCTs and multiple-use simulation models on all-cause and CRC-specific MR of CRC screening, 10 RCTs and 47 model studies, including 9 simulation models, were retrieved. Our pooled results show that biennial FIT, gFOBT, single/5-yearly FS, and 10-yearly colonoscopy screenings reduced CRC-specific mortality significantly, and 10-yearly colonoscopy is the most effective with a mortality reduction of 73%. Approximately 1% all-cause MR was presented in FIT, gFOBT, and FS scenarios with high adherence or adjusted compliance. Adherence is a crucial factor in the effectiveness of CRC screening, and higher adherence leads to more significant MR. In model studies, younger screening initiation ages were associated with higher MR than older ages.
Although RCTs should be considered the gold standard for evaluating the benefits and cost-effectiveness of screening strategies, trials require large amounts of time and medical resources, and not all potential scenarios can be evaluated. Simulation models overcome these disadvantages and can assess multiple scenarios [47]. However, our main finding was that the pooled MRs in model studies tended to be higher than in RCTs. One explanation for this overestimation might be uncertainty in model parameters and in assumptions on CRC progression; most model studies assume ideal parameters and lack external validation [47]. Another possible explanation might be that model studies applied perfect adherence with lifetime follow-up, which resulted in higher than realistic MR [13,77]. In the real world, many individuals of screening age are not screened properly [77,78].
Our pooled estimates of RCTs showed CRC-specific MR of 12% in gFOBT and 24% in single FS screenings compared to the control group. Previous systematic reviews and meta-analyses reported 12% and 18% MR with gFOBT screening, respectively [79,80]. Others reported 26-28% CRC-specific MR with single FS screening [15,16,80]. The compliance-adjusted analysis showed that FS screening decreased CRC-specific mortality by 41%, which is in accordance with Brenner et al. [15]. Regarding all-cause MR, a prior study indicated that single FS and gFOBT screening yielded little or no reduction in all-cause mortality compared with no screening [80]. However, gFOBT screening slightly reduced all-cause mortality by 1% in our results, possibly because only screening participants in the intervention group were included, which amplified the effect of screening.
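The percentages quoted here follow directly from the pooled rate ratios reported in the abstract, since MR is one minus the rate ratio. The short sketch below reproduces that arithmetic; the function name `mortality_reduction_pct` is hypothetical.

```python
def mortality_reduction_pct(rate_ratio: float) -> float:
    """Convert a mortality rate ratio (screening vs. no screening) to MR in percent."""
    return (1.0 - rate_ratio) * 100.0


# Pooled CRC-specific rate ratios as reported in the abstract.
pooled_rate_ratios = {
    "gFOBT (RCTs)": 0.88,
    "single FS (RCTs)": 0.76,
    "10-yearly colonoscopy (models)": 0.27,
}

for label, rate_ratio in pooled_rate_ratios.items():
    print(f"{label}: {mortality_reduction_pct(rate_ratio):.0f}% CRC-specific MR")
```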
All included scenarios decreased CRC-specific mortality significantly in model studies. Several reviews of model studies concluded that all CRC screening strategies were more effective than no screening [81,82]. In addition, in our pooled estimates, CRC-specific MR of 10-yearly colonoscopy was the highest among all scenarios, while MR of 10-yearly colonoscopy with realistic adherence was not dominant. In general, the adherence of FIT or gFOBT is higher than that of colonoscopy [83]. Thus, although colonoscopy is considered to have a strong capability in CRC and adenoma screening, the dominance of the colonoscopy scenario is not absolute in reality once adherence is considered [83,84]. This is consistent with the result of Zhong et al. [83]. Another interesting finding was that biennial gFOBT showed higher MR than biennial FIT. This is unexpected, because gFOBT has lower sensitivity and specificity for CRC than FIT, which should result in lower MR than other fecal-based scenarios [31,85,86]. The finding may be explained by the wider screening age range used with gFOBT, and by the fact that all except one study used perfect adherence and lifetime follow-up.
Another finding was that adherence and screening initiation age are crucial factors in the effectiveness of CRC screening. Most included models aimed to compare the cost-effectiveness of screening scenarios under optimal conditions, and therefore assumed 100% compliance. However, in daily screening practice, only part of the invited population will attend CRC screening, which will reduce the screening efficiency [87]. Reported estimates for CRC screening adherence are over 60% in high-income countries in Europe, and generally less than 40% in Eastern European countries [88]. Therefore, the use of real adherence estimates in simulation models will yield more realistic values for the evaluation of screening scenarios. Additionally, the WHO stated that high adherence is the critical factor for successful implementation of a screening program [88,89]. Prior studies indicated that several measures contribute to improved adherence, including telephone contact with a navigator, narrative invitation letters, and an approach in which awareness of CRC and of the purpose of CRC screening is strengthened by using an enhanced procedural informational brochure [90–93]. A prior study also revealed that the effectiveness of CRC screening was influenced by screening initiation age [73]. The American Cancer Society recommends starting CRC screening at age 45 instead of 50, which leads to more favorable cost-effectiveness [31,73]. Our finding also suggested that younger screening initiation ages are correlated with higher CRC-specific MR. However, because we did not consider screening costs, we cannot conclude that early initiation ages are dominant scenarios.
An explanation for the publication bias in model studies is that screening techniques are sensitive to early cancers and precursor lesions, which could be detected and treated at early stage [4,8]. The majority of results are positive due to early diagnosis, which does not introduce bias into our results.

Strengths and Limitations
This study combined results reported in the latest English publications of CRC screening RCTs worldwide, which best represent CRC screening effectiveness. This is also the first study that pooled the benefits of CRC screening in model studies and compared the outcomes with RCTs. Additionally, this study reviewed the effects of CRC screening on all-cause MR. There are some limitations in our study. First, populations in both RCT and model studies were only from Europe, the United States and Australia, so generalization of findings to other parts of the world should be carried out with caution. Second, this study did not include cost, detection rate, and false positive rates, which need to be considered when evaluating optimal screening scenarios and should therefore be added to future research. Third, the assumptions and parameters of the simulation models differed, which leads to variability in results. There were relatively more publications on CRC-SPIN, SimCRC, and MISCAN-Colon, which are the three main models recommended by CISNET. Thus, these three models might have a greater impact on the pooled estimates than the other models. This inevitably resulted in a quasi-publication bias. Fourth, this study focused on commonly used scenarios. Further systematic reviews and meta-analyses focusing on scenarios with other ages, intervals, and novel screening techniques are necessary to expand the scope of screening effectiveness assessment. Fifth, only perfect adherence and widely used screening ages were considered in the model studies. Screening interval, age of screening initiation, and adherence, which might influence screening effectiveness, were included in scenario construction or sensitivity analyses in some model studies. However, there are no studies that explored MR as a function of different screening scenarios and adherence.

Conclusions
Our systematic review and meta-analysis provides a summary of the latest RCT and model studies of CRC screening on all-cause and CRC-specific MR. Commonly adopted global and Dutch screening scenarios could decrease CRC-specific mortality significantly, and 10-yearly colonoscopy screening is likely to be the most effective. The compliance-adjusted outcome with gFOBT in RCTs showed an all-cause MR of 1%, and reductions of 0.62-1.1% were shown in model studies with single FS screening. Our findings suggest that adherence is an important factor in CRC-specific mortality and is an explanation for the discrepancy in pooled estimates. Therefore, increased CRC screening adherence improves screening effectiveness. In model studies, real-life adherence data should be used, and external validation should be performed for realistic outcomes. Lower screening initiation ages reduce CRC mortality.

Supplementary Materials:
References [13,20–25,28–43,47–58,60,62–64,70,74,75,94,95] are cited in the supplementary materials.
Table S1: Literature Search Strategy
Box S1: Selection criteria for articles
Box S2: Qualitative assessment framework on model characteristics
Table S2: Subgroup analysis of RCTs (Intention-to-treat analyses)
Table S3: Subgroup analysis of RCTs on CRC-specific mortality reduction by FOBT (Compliance-adjusted analysis)
Table S4: Quality assessment of included models
Table S5: Characteristics of Simulation models on biennial gFOBT Screening from the age of 45 to 80
Table S6: Characteristics of Simulation models on Single FS Screening from the age of 50 to 75
Table S7: Characteristics of Simulation models on 10-yearly colonoscopy Screening from the age of 55 to 75
Table S8: Characteristics of Simulation models on 5-yearly flexible Sigmoidoscopy (FS) Screening from the age of 55 to 75
Table S9: Egger's Test for publication bias of Simulation modelling Studies on CRC-specific mortality reduction
Table S10: Meta regression of Simulation models on CRC-specific mortality reduction by 10-yearly colonoscopy Screening from the age of 55 to 75
Table S11: Subgroup analysis of Simulation models on CRC-specific mortality reduction by 10-yearly colonoscopy Screening from the age of 55 to 75
Table S12: Subgroup analysis of Simulation models on CRC-specific mortality reduction by biennial gFOBT Screening from the age of 45 to 80
Table S13: Subgroup analysis of Simulation models on CRC-specific mortality reduction by Single FS Screening from the age of 50 to 75
Table S14: Characteristics of Simulation models on all-cause mortality reduction of CRC Screening
Figure S1: Quality assessment of included RCTs articles
Figure S2: Funnel plots for assessing publication bias (Intention-to-treat analyses). a. All-cause mortality rate ratio on FOBT Screening programs; b. CRC-specific mortality rate ratio on FOBT Screening programs; c. All-cause mortality rate ratio on FS Screening programs; and d. CRC-specific mortality rate ratio on FS Screening programs
Figure S3: Forest plots of the compliance-adjusted analysis. a. All-cause mortality rate ratio on FOBT Screening programs; b. CRC-specific mortality rate ratio on FOBT Screening programs; c. All-cause mortality rate ratio on FS Screening programs; d. CRC-specific mortality rate ratio on FS Screening programs
Figure S4: Forest plots of the CRC-specific mortality rate ratio on 10-yearly colonoscopy Screening from the age of 55 to 75
Figure S5: Forest plots of the CRC-specific mortality rate ratio on 5-yearly FS Screening from the age of 55 to 75
Figure S6: Forest plots of the CRC-specific mortality rate ratio on biennial FIT Screening. a. from the age of 50 to 75; b. from the age of 45 to 75
Figure S7: Forest plots of the CRC-specific mortality rate ratio on 10-yearly colonoscopy Screening. a. from the age of 50 to 75; b. from the age of 45 to 75
Figure S8: Forest plots of the CRC-specific mortality rate ratio on 5-yearly FS Screening. a. from the age of 50 to 75; and b. from the age of 45 to 75

Acknowledgments:
The authors acknowledge the information specialist (Sjoukje van der Werf) of the Central Medical Library (CMB) at University Medical Centre Groningen for helping develop the search strategies of this review. The author, Senshuang Zheng, received support from the China Scholarship Council (CSC) for her research. The council had no role in the study design, data analysis, interpretation, or reporting of the results.